Human coronaviruses (HCV) are ubiquitous pathogens which cause respiratory, gastrointestinal, and possibly
neurological disorders. To better understand the molecular biology of the prototype HCV-229E strain, the complete
nucleotide sequence of the membrane protein (M) gene was determined from cloned cDNA. The open reading frame is
preceded by a consensus transcriptional initiation sequence UCUAAACU, identical to the one found upstream of the N
gene. The M gene encodes a 225-amino acid polypeptide with a molecular weight (MW) of 25,822, slightly higher than
the apparent MW of 19,000-22,000 observed for the unprocessed M protein obtained after in vitro translation and
immunoprecipitation. The M amino acid sequence presents a significant degree of homology (38%) with its counterpart
of transmissible gastroenteritis coronavirus (TGEV). The M protein of HCV-229E is highly hydrophobic and its
hydropathicity profile shows a transmembranous region composed of three major hydrophobic domains characteristic of a
typical coronavirus M protein. About 10% (20 amino acids) of the HCV-229E M protein constitutes a hydrophilic and
probably external portion. One N-glycosylation and three potential O-glycosylation sites are found in this exposed
domain. © 1990 Academic Press, Inc.


Human coronaviruses (HCV) belong to either one of
two antigenic groups, represented by the prototype
strains 229E and OC43 (1). They are responsible for as
much as 25% of common colds (2, 3) and have been
associated with gastrointestinal disorders (4). Their
possible involvement in neurological diseases was
suggested by the observation of coronavirus-like
particles in the brain of one multiple sclerosis (MS) patient
(5), the isolation of coronaviruses from two MS brain
tissues passaged in mice (6), and the detection of
intrathecal antibodies to HCV-OC43 and HCV-229E in
MS patients (7). However, the association of human
coronaviruses with neurological diseases has not yet
been confirmed.

HCV-229E possesses a single-stranded, positive-sense
RNA genome with a molecular weight of 5.8 × 10^6
and a poly(A) tail of about 70 nucleotides at the
3' end (8). As with other coronaviruses, six subgenomic
RNAs are synthesized in infected cells (9). These appear
to have lower molecular weights than viral RNAs
synthesized in cells infected with murine hepatitis virus
(MHV). At least four polypeptides have been found in
purified HCV-229E virions: 160- to 200-kDa and 88- to
105-kDa glycoproteins which may be analogous to the
spike glycoprotein S (previously designated E2) of MHV
(10); a 47- to 53-kDa polypeptide corresponding to the
nucleocapsid protein N; and a 17- to 26-kDa M protein
(previously designated E1) observed in both glycosylated
and nonglycosylated forms (11-14). One author
also reported glycoproteins of 31 and 65 kDa (11).

1 To whom requests for reprints should be addressed.

The nucleotide sequence of the genes encoding the
nucleocapsid proteins as well as the mRNA leader
sequences of HCV-229E and HCV-OC43 have recently
been determined (15, 16). As a continuation of these
studies, we report the nucleotide sequence of the gene
encoding the membrane protein M of HCV-229E. Its
predicted amino acid sequence is compared with
sequences determined for other coronaviruses.

Clones containing the sequence of the M protein
gene were obtained from a cDNA library constructed
with mRNA isolated from HCV-229E-infected L132
cells, and identified using a genome-specific probe
(15). One clone, designated L8, was selected for
sequencing since it contained a large 3.6-kb insert
overlapping by 1.2 kb the 5' end of the N protein gene. The
remaining 2.4-kb fragment was excised from an internal
PstI site of clone L8 and subcloned into the pBluescript II
vector (Stratagene). Unidirectional deletions of
the 2.4-kb insert were created using exonuclease III,
mung bean nuclease, and deoxythionucleotide derivatives
(Stratagene). The sequencing of both strands was
performed by the plasmid sequencing technique (17),
using T7 DNA polymerase. In vitro translation of
poly(A)+ mRNAs isolated from HCV-229E-infected
L132 cells was carried out in order to determine the
molecular mass of the unprocessed viral polypeptides.

5'- CTACTAGTGTGTATTACAATAATTAAACTAACTAAGCTTTGTTTCACTTGCCATATGTTTTGTACTAGAACAATT  75
TATGGCCGCATTAAAAATGTGTACCACATTTACCAATCATATATGCACATAGACCCTTTCCCTAAACGAGTTATTGATC  154

TCTAAACTAAACGACA ATG TCA AAT GAC AAT TGT ACG GGT GAC ATT GTC ACC CAT TTG AAG AAT  218
                  M   S   N   D   N   C   T   G   D   I   V   T   H   L   K   N    16
                      ★           •       ★                   ★

TGG AAT TTT GGT TGG AAT GTT ATT CTA ACC ATA TTC ATT GTT ATT CTT CAG TTT GGA CAC  278
 W   N   F   G   W   N   V   I   L   T   I   F   I   V   I   L   Q   F   G   H    36

TAT AAA TAC TCC AGA TTG CTT TAT GGT TTG AAG ATG CTT GTA CTG TGG CTT CTT TGG CCA  338
 Y   K   Y   S   R   L   L   Y   G   L   K   M   L   V   L   W   L   L   W   P    56

CTC GTA CTT GCT TTG TCA ATC TTT GAC ACC TGG GCT AAT TGG GAT TCT AAT TGG GCC TTT  398
 L   V   L   A   L   S   I   F   D   T   W   A   N   W   D   S   N   W   A   F    76

GTT GCA TTT AGC CTT CTT ATG GCC GTA TCA ACA CTC GTT ATG TGG GTG ATG TAC TTC GCA  458
 V   A   F   S   L   L   M   A   V   S   T   L   V   M   W   V   M   Y   F   A    96

AAC AGT TTC AGA CTT TTC CGA CGT GCT CGA ACT TTT TGG GCA TGG AAT CCT GAG GTC AAT  518
 N   S   F   R   L   F   R   R   A   R   T   F   W   A   W   N   P   E   V   N   116

GCA ATC ACT GTC ACA ACC GTG TTG GGA CAG ACA TAC TAT CAA CCC ATT CAA CAA GCT CCA  578
 A   I   T   V   T   T   V   L   G   Q   T   Y   Y   Q   P   I   Q   Q   A   P   136

ACA GGC ATT ACT GTG ACC TTG CTG AGC GGC GTG CTT TAC GTT GAC GGA CAT AGA TTG GCT  638
 T   G   I   T   V   T   L   L   S   G   V   L   Y   V   D   G   H   R   L   A   156

TCA GGT GTT CAG GTT CAT AAC CTA CCT GAA TAC ATG ACA GTT GCC GTG CCG AGC ACT ACT  698
 S   G   V   Q   V   H   N   L   P   E   Y   M   T   V   A   V   P   S   T   T   176

ATA ATT TAT AGT AGA GTC GGA AGG TCC GTA AAT TCA CAA AAT AGC ACA GGC TGG GTT TTC  758
 I   I   Y   S   R   V   G   R   S   V   N   S   Q   N   S   T   G   W   V   F   196
                                                     •

TAC GTA CGA GTA AAA CAC GGT GAT TTT TCT GCA GTG AGC TCT CCC ATG AGC AAC ATG ACA  818
 Y   V   R   V   K   H   G   D   F   S   A   V   S   S   P   M   S   N   M   T   216

GAA AAC GAA AGA TTG CTT CAT TTT TTC TAA ACTGAACGAAAAG ATG  864
 E   N   E   R   L   L   H   F   F                         225

Fig. 1. Complete nucleotide sequence of the M protein gene of HCV-229E and its predicted amino acid sequence. The leader sequences are
underlined and the potential N-glycosylation (•) and O-glycosylation (★) sites at the putatively external N-terminus of the polypeptide are also
indicated. Two other N-glycosylation sites are found at the C-terminus of the protein. The nucleotide sequence from position 792 is from Ref.
(15).
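
The reading frame shown in Fig. 1 can be checked mechanically. The following is a minimal Python sketch, not the procedure used for this report (the text does not describe how the translation was derived). It translates the first and last codon rows of Fig. 1 with the standard genetic code and checks that an open reading frame running from base 171 through base 848, stop codon included, corresponds to 225 residues. The helper name `translate` is illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: end of the reading frame
            break
        protein.append(aa)
    return "".join(protein)


# First and last codon rows of the open reading frame, copied from Fig. 1.
first_row = "ATG TCA AAT GAC AAT TGT ACG GGT GAC ATT GTC ACC CAT TTG AAG AAT"
last_row = "GAA AAC GAA AGA TTG CTT CAT TTT TTC TAA"

print(translate(first_row))
print(translate(last_row))

# Coordinates quoted in the text: bases 171 through 848, stop codon included.
orf_start, orf_end = 171, 848
codons = (orf_end - orf_start + 1) // 3
print(codons - 1)  # sense codons, excluding the stop codon
```

Translating only two rows keeps the sketch short; the same function applies unchanged to the full 678-nucleotide reading frame.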

The complete nucleotide sequence of the M protein
gene of HCV-229E and its predicted amino acid
sequence are presented in Fig. 1. The AUG codon is
preceded by the consensus intergenic sequence UCUAAACU,
which is identical to that upstream of the
nucleocapsid protein-coding sequence (15 and Fig. 1).
This sequence is conserved among coronaviruses of
various species and represents the binding site of the
leader RNA which mediates a discontinuous
transcription of mRNAs (18). The longest open reading frame
extends from base 171 through base 848 and encodes
a 225-amino acid polypeptide with a calculated
molecular weight of 25,822.

Fig. 2. Immunoprecipitation of in vitro translation products from
HCV-229E mRNAs. Poly(A)+ mRNAs were translated in the presence
of [35S]methionine, using a rabbit reticulocyte lysate (Promega
Biotec). The viral polypeptides were immunoprecipitated and separated
by SDS-PAGE (13% acrylamide). Lane 1, molecular mass standards;
lane 2, mRNAs from HCV-229E-infected cells; lane 3, mRNAs from
noninfected cells; lane 4, translation without exogenous mRNA.
Molecular mass standards (kDa) are indicated on the left. The calculated
molecular masses of relevant viral proteins (kDa) are indicated on the
right.

The products of in vitro translation
of poly(A)+ mRNAs from HCV-229E-infected cells
were precipitated with a polyclonal antiserum prepared
against purified HCV-229E virions. As shown in Fig. 2,
six viral polypeptides were observed, which migrated
with apparent molecular masses of 98, 58, 42, 28.5,
19, and 14 kDa, respectively. Although the identity of
these proteins has not been firmly established, by
comparison with other coronaviruses, p98 probably
corresponds to S, p58 to N, and p19 to M. The nature of p42
and p28.5 is not known at this time. Thus, the
molecular mass of M predicted from the nucleotide sequence
is slightly higher than the molecular mass estimated by
SDS-PAGE. Other studies have shown that the mature
M protein has a molecular mass of 23 to 26 kDa (12-14)
and that virions also incorporate a nonglycosylated
20- to 22-kDa precursor of the M protein (12, 14). The
latter observation is consistent with the identification
of in vitro translated p19 as M. The lower apparent
molecular mass of M estimated by SDS-PAGE is
consistent with the unusual electrophoretic behavior of this
and other hydrophobic proteins, as was observed for
MHV (19).
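
The calculated molecular weight can be approximated from the predicted sequence alone. The sketch below is an illustration under stated assumptions, not the calculation behind the reported value: it sums standard average residue masses (values assumed here, not given in the text) over the 225 residues read from Fig. 1 and adds one water molecule for the free termini. The result should fall close to the reported 25,822; small differences are expected from the choice of mass table.

```python
# Average residue masses in daltons (standard values, assumed for this sketch).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per polypeptide chain (free N- and C-termini)

# Predicted M protein sequence of HCV-229E, transcribed row by row from Fig. 1.
M_229E = (
    "MSNDNCTGDIVTHLKN"
    "WNFGWNVILTIFIVILQFGH"
    "YKYSRLLYGLKMLVLWLLWP"
    "LVLALSIFDTWANWDSNWAF"
    "VAFSLLMAVSTLVMWVMYFA"
    "NSFRLFRRARTFWAWNPEVN"
    "AITVTTVLGQTYYQPIQQAP"
    "TGITVTLLSGVLYVDGHRLA"
    "SGVQVHNLPEYMTVAVPSTT"
    "IIYSRVGRSVNSQNSTGWVF"
    "YVRVKHGDFSAVSSPMSNMT"
    "ENERLLHFF"
)


def molecular_weight(seq):
    """Average molecular weight of an unmodified polypeptide, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


print(len(M_229E))
print(round(molecular_weight(M_229E)))
```

The estimate ignores glycosylation and any other processing, so it corresponds to the unprocessed polypeptide discussed above rather than to the mature virion protein.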

As in TGEV (20), the predicted M protein sequence
contains three amino acid sequences characteristic of
N-glycosylation sites (Asn-5, Asn-190, and
Asn-214), although only one (Asn-5) is found near the
N-terminus, as compared to two for TGEV. Moreover,
three potential O-glycosylation sites are located in the
putatively external N-terminus of the polypeptide (Ser-2,
Thr-7, and Thr-12). In addition, there is only one
cysteine residue (Cys-6). Other coronavirus M proteins
contain two (bovine coronavirus, BCV; Ref. (21)), four
(MHV-A59 and JHM; Refs. (19) and (22), respectively),
eight (TGEV; Ref. (20)), or nine (infectious bronchitis
virus, IBV; Ref. (23)) cysteine residues. The single cysteine
residue of HCV-229E M is probably important in forming
interchain disulfide bridges, since M of HCV-229E has been
shown to form oligomers under nonreducing conditions (14).

No significant nucleotide sequence homology exists
between the M genes of HCV-229E and other
coronaviruses. The highest M amino acid homology (38%, or
100 of 262 residues) occurs between HCV-229E and
TGEV, which have been reported to be antigenically related
(24). Antigenically distinct BCV, MHV, and IBV show
amino acid homologies of 32, 30, and 28%,
respectively. In contrast, a homology of 87% was found
between the M proteins of BCV and MHV-A59 (21), which
belong to another antigenic subgroup (24). On the
other hand, a homology of 34% was found between
the M proteins of TGEV and BCV (25), which belong to
two different antigenic subgroups. Figure 3 illustrates
the M regions common to both HCV-229E and TGEV.
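
As an illustration of how such a percentage is obtained from an alignment, the sketch below counts identical columns between two aligned sequences. It is a hedged example, not the MacGene Plus procedure: the function `percent_identity` and its optional `denominator` argument are assumptions made for this sketch, and the two 50-column blocks are the second alignment row of Fig. 3 with extraction spaces removed. The last line simply reproduces the arithmetic quoted in the text (100 of 262 residues).

```python
def percent_identity(aligned_a, aligned_b, denominator=None):
    """Count identical, non-gap columns of two aligned sequences.

    Returns (identical_columns, percentage). By default the percentage is
    taken over the alignment length; pass `denominator` to use another
    reference length, such as the length of one of the two proteins.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    if denominator is None:
        denominator = len(aligned_a)
    return identical, 100.0 * identical / denominator


# Second alignment row of Fig. 3 (HCV-229E above, TGEV below).
hcv_block = "THLKNWNFGWNVILTIFIVILQFGHYKYSRLLYGLKMLVLWLLWPLVLAL"
tgev_block = "WHLANWNFSWSIILIVFITVLQYGRPQFSWFVYGIKMLIMWLLWPVVLAL"

identical, pct = percent_identity(hcv_block, tgev_block)
print(identical, round(pct, 1))

# Overall figure quoted in the text: 100 identical residues out of 262.
print(round(100 * 100 / 262, 1))
```

This block covers the first transmembrane region and is more conserved than the protein as a whole, which is why its local identity exceeds the overall 38%.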

As with other coronaviruses, the M protein of
HCV-229E is a highly hydrophobic membrane protein. It
contains 51% hydrophobic residues, compared to
45-51% for other coronaviruses (19-23, 25). The
hydropathicity profiles of M proteins from HCV-229E, TGEV,
BCV, MHV-JHM, MHV-A59, and IBV are presented in
Fig. 4. The main features characterizing these M
proteins include three large hydrophobic domains
alternating with short hydrophilic regions. This suggests a
selective pressure to maintain the potential
transmembranous domains of this coronavirus protein. As with
other coronaviruses (26), only about 10% (20 amino
acids) of the HCV-229E M protein constitutes the
hydrophilic putative external domain. On the other hand,
the large N-terminal putative signal sequence found
only in the M protein of TGEV (20, 25) is not observed
in HCV-229E, which is similar to the structure reported
for BCV, MHV-JHM, MHV-A59, and IBV.

HCV-229E  -MSNDNC-T-GDIV
TGEV      MKILLILACVIACACGERYCAMKSDTDLSCRNSTASDC--ESCFNGGDLI   50

HCV-229E  THLKNWNFGWNVILTIFIVILQFGHYKYSRLLYGLKMLVLWLLWPLVLAL
TGEV      WHLANWNFSWSIILIVFITVLQYGRPQFSWFVYGIKMLIMWLLWPVVLAL  100

HCV-229E  SIFDTWANWD-SNWAFVAFSLLMAVSTLVMWVMYFANSFRLFRRARTFWA
TGEV      TIFNAYSEYQVSRYVMFGFSIAGAIVTFVLWIMYFVRSIQLYRRTNSWWS  150

HCV-229E  WNPEVNAITVTTVLGQTYYQPIQQAPTGITVTLLSGVLYVDGHRLASGVQ
TGEV      FNPETKAILCVSALGRSYVLPLEGVPTGVTLTLLSGNLYAEGFKIADGMN  200

HCV-229E  VHNLPEYMTVAVPSTTIIYSRVGRSVNSQNSTGWVFYVRVKHGDFSAVSS
TGEV      IDNLPKYVMVALPSRTIVYTLVGKKLKASSATGWAYYVKSKAGDYST-EA  250

HCV-229E  PMSNMTENERLLHFF
TGEV      RTDNLSEQEKLLH-MV  266

Fig. 3. Comparison of the predicted amino acid sequences of M proteins of HCV-229E (top row) and TGEV (bottom row) aligned for maximum
homology. Regions common to both proteins are underlined. The analysis was performed on an Apple Macintosh Plus computer with the
MacGene Plus program (Applied Genetic Technology Inc., Fairview Park, OH).

The coronavirus M protein is important for several
reasons. This membrane protein is implicated in virus
assembly and is believed to integrate the viral proteins
prior to budding, most likely because of this protein's
affinity for RNA (26). Moreover, some monoclonal
antibodies against the M protein of MHV-JHM are
protective in vivo and thus may influence the outcome of
disease (27). We are currently pursuing the cloning and
sequencing of other genes of HCV-229E, with
emphasis on the gene coding for the spike protein S, which is
potentially important in viral pathogenicity. The
availability of molecular probes for human coronavirus
genes opens new avenues for the verification of the
potential involvement of these viruses in neurological
diseases.


Fig. 4. Hydropathicity profiles of M proteins from HCV-229E,
TGEV, BCV, MHV-JHM, MHV-A59, and IBV determined according to
Kyte and Doolittle (28). The analysis was performed with the
MacGene Plus program as described in the legend to Fig. 3. Each point
is the mean hydropathicity of a span of seven residues. Peaks
extending upwards correspond to hydrophobic regions and peaks
extending downwards to hydrophilic areas.
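
The profile described in the legend can be sketched as a sliding-window mean over the Kyte and Doolittle hydropathy scale. The code below is an illustration rather than a reimplementation of MacGene Plus: the scale values are the published Kyte-Doolittle indices, the window of seven residues follows the legend, and the sequence is the HCV-229E M protein as read from Fig. 1. The function name `hydropathy_profile` is an assumption of this sketch.

```python
# Kyte-Doolittle hydropathy indices (positive values are hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Predicted M protein sequence of HCV-229E, transcribed row by row from Fig. 1.
M_229E = (
    "MSNDNCTGDIVTHLKN"
    "WNFGWNVILTIFIVILQFGH"
    "YKYSRLLYGLKMLVLWLLWP"
    "LVLALSIFDTWANWDSNWAF"
    "VAFSLLMAVSTLVMWVMYFA"
    "NSFRLFRRARTFWAWNPEVN"
    "AITVTTVLGQTYYQPIQQAP"
    "TGITVTLLSGVLYVDGHRLA"
    "SGVQVHNLPEYMTVAVPSTT"
    "IIYSRVGRSVNSQNSTGWVF"
    "YVRVKHGDFSAVSSPMSNMT"
    "ENERLLHFF"
)


def hydropathy_profile(seq, window=7):
    """Mean Kyte-Doolittle hydropathy of every window of `window` residues."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


profile = hydropathy_profile(M_229E)
print(len(profile))          # one value per window start
print(round(profile[0], 2))  # first window (residues 1-7), hydrophilic
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak + 1, round(profile[peak], 2))  # start of the most hydrophobic window
```

Plotting `profile` against window position gives a curve of the kind shown in Fig. 4, with the N-terminal windows below zero and the hydrophobic domains appearing as positive peaks.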